Facilitated diffusion of DNA-binding proteins: Efficient simulation with 
the method of excess collisions (MEC) 
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In this paper, a new method to efficiently simulate diffusion controlled second order chemical 
reactions is derived and applied to site-specific DNA-binding proteins. The protein enters a spherical 
cell and propagates via two competing modes, a free diffusion and a DNA-sliding mode, to search for 
its specific binding site in the center of the cell. There is no need for a straight forward simulation 
of this process. Instead, an alternative and exact approach is shown to be essentially faster than 
explicit random- walk simulations. The speed-up of this novel simulation technique is rapidly growing 
with system size. 

PACS numbers: 87.16.Ac 
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INTRODUCTION 



Diffusion controlled bio-chemical reactions play a cen- 
tral role in keeping any organism alive [jj, |2| : The trans- 
port of molecules through cell membranes, the passage 
of ions across the synaptic gap, or the search carried out 
by drugs on the way to their protein receptors are pre- 
dominantly diffusive processes. Further more, essentially 
all of the biological functions of DNA are performed by 
proteins that interact with specific DNA sequences 0,01, 
and these reactions are diffusion-controlled. 

However, it has been realized that some proteins are 
able to find their specific binding sites on DNA much 
more rapidly than is 'allowed' by the diffusion limit 0,01. 
It is therefore generally accepted that some kind of facil- 
itated diffusion must take place in these cases. Several 
mechanisms, differing in details, have been proposed. All 
of them essentially involve two steps: the binding to a 
random non-specific DNA site and the diffusion (sliding) 
along the DNA chain. These two steps may be reiter- 
ated many times before proteins actually find their tar- 
get, since the sliding is occasionally interrupted by dis- 
sociation. Berg 5| and Zhou have provided thorough 
(but somewhat sophisticated) theories that allow esti- 
mates for the resulting reaction rates. Recently, Halford 
has presented a comprehensive review on this subject and 
proposed a remarkably simple and semiquantitative ap- 
proach that explicitly contains the mean sliding length 
as a parameter of the theory |j| ■ This approach has been 
refined and put onto a rigorous base in a recent work by 
the authors @|. 

Although analytical models provide a good general un- 
derstanding of the problem, they fail to give quantitative 
predictions for systems of realistic complexity. There- 
fore, numerical simulations are required to calibrate the 
set of parameters that form the backbone of these mod- 



els. However, a straight forward simulation of a protein 
searching through mega-bases of non-target DNA to find 
its specific binding site would be prohibitive for all ex- 
cept for the most simple numerical models. Fortunately, 
there are better ways. Two of the authors (KK and JL) 
have recently introduced the method of excess collisions 
(MEC) for an efficient simulation of intramolecular reac- 
tions in polymers Q. In the present work, this method 
is modified to apply to second order diffusion controlled 
chemical reactions (Section l2.1[) . We thereby construct a 
simple random walk approach to facilitated diffusion of 
DNA-binding proteins (Section l2~2l and apply the MEC 
and our analytical estimate for reaction times to this 
model (Section 12.31 and I2.4f> . Section [3] provides details 
about the generation of DNA-chains, followed by a set of 
simulations covering a large range of system dimensions 
(Section to verify the performance of the MEC. 



2. THEORY 
2.1. Method of excess collisions (MEC) 

We consider a (time-homogeneous) stochastic process. 
The problem is to find the average time tba of the first 
arrival at a certain state A, provided that, at time t = 0, 
the system occupied another state B. 

Suppose we observe the system for a long time inter- 
val T and monitor the events of entering state A. These 
events will be referred to as collisions. Each collision 
that occurs for the first time after visiting state B will 
be called prime collision. We obtain the (asymptotically 
correct for T — > oo) relation 



T = n{T)T R = n'{T)r' R 



(1) 
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where n(T) and n'(T) are the average numbers of all 
and of prime collisions during the time interval T, re- 
spectively, and tr and t' r are the corresponding mean 



2 



recurrence times. Hence, 



n(T) 
n'{T) 



t r = Nt r 



(2) 



The ratio N = n(T)/n'(T) defines the average number of 
collisions between two visits to state B and does actually 
not depend on T, once T is chosen sufficiently large. The 
mean recurrence time t' r of prime collisions is simply the 
average time the system requires to move from state A 
to B and back from state B to A: 



tab + tea , 



(3) 



where tab is the mean time of first arrival at state B 
starting from A. With eq. J2J we then obtain 



tba = Nt r — tab 



(4) 



This relation is useful for the numerical estimation of 
tba if tba ^ tab ■ A simulation cycle then starts in state 
A and ends as soon as state B is reached, i.e. the reversed 
reaction A — > B is simulated in order to obtain the (much 
lower) reaction rate of the original reaction B — > A. In 
this case we can write 



N = (AW + 1 



(5) 



where (N co u} is the average number of collisions in a sim- 
ulation cycle and the second term accounts for the prime 
collision (which is not observed in the simulations, since 
the cycle starts at the time instant that immediately fol- 
lows the prime collision). As will be shown later in Sec- 
tion ^. 31 the recurrence time tr can be renormalized and 
computed efficiently inside a small test system. Note that 
eq. 10} can be written as 



r BA = (A^E + 1) TR , 



where 



(A r coii) 



tab 

TR 



(6) 



(7) 



is the mean number of excess collisions per simulation 
cycle 0, since the ratio tab/t r is just the mean number 
of collisions that would be observed in a simulation run 
of length tab with a starting point at an arbitrary state 
of the system (not necessary state A) . 



2.2. Simple model for facilitated diffusion of 
DNA-binding proteins 

We consider a spherical volume (cell) of radius R and 
inside it a worm-like chain (DNA) of length L and radius 
r c . The protein is represented as a random walker mov- 
ing inside the cell with a certain time step dt. A collision 
takes place once the walker enters the active binding site, 
a spherical volume of radius r a positioned in the middle 
of the chain that, in its turn, coincides with the center 



of the cell. We want to point out that the parameter r a 
does not necessary correspond to any geometrical length 
in the real system. It defines a probability for the re- 
action to take place, and may cover additional variables 
which are not included explicitly in the model, like pro- 
tein orientation and conformation. An attractive step 
potential is implemented as 



17(d) = 



-E 




d <r c 
d> TV 



(8) 



where d is the shortest distance between walker and 
chain. This defines a pipe with radius r c around the 
chain contour that the walker is allowed to enter freely 
from outside, but to exit only with the probability 



p = exjp(-E a /kBT) 



(9) 



where k^T is the Boltzmann factor, otherwise it is re- 
flected back inside the chain. We may therefore denote p 
as exit probability. It is important to note that p defines 
the equilibrium constant K of the two phases, the free 
and the non-specifically bound protein, according to 



c p 



(10) 



where c is the concentration of free proteins and a — 
cV c /(pL) is the linear density of proteins that are non- 
spccifically bound to the DNA, with V c — nr^L being 
the geometric volume of the chain. 



2.3. Method of computation of the recurrence time 

The two states of interest are the protein entering the 
cell, B, and the same protein reaching the active site in 
the center of the cell, A. More specifically, we are inter- 
ested in finding the time tba the walker requires to reach 
a distance r = r a when starting at distance r{t = 0) = R. 

We shall first define the excluded volume of the chain 



V m = 



1 — exp 



-U[d(r)] 
kBT 



dv = V r 



1 

1 - - 

P 



(11) 

where U(d) is the energy of the walker as defined by eq. 
(JSJ) and the integration is performed over the geometric 
volume of the cell, V — (4/3)7ri? 3 . The effective volume 
V e s of the cell is then 



eff 



v-v c , 



V + V C ( - 



(12) 



Next we assume that simulations were carried out within 
a small test system of radius R* < R and that the re- 
currence time t r of the walker was found. Its recurrence 
time in the larger system is then found as 



tr(V) = Tr Kff , 



(13) 
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where we have defined 



V 



(14) 



ctr 



This ratio does not depend on system size and may there- 
fore be called specific recurrence time. It only depends on 
the potential-depth E a and the step-size chosen for the 
random walk. The idea is to compute tr (as described 
in Section [3} for a small test system with dimensions 
of the order of r a (which is the radius of the specific 
binding site) to obtain tr for the system of interest us- 
ing eq. 113|) . Once tr is known, tab is computed via 
random walk simulations in the large system, starting at 
r(t = 0) = r a and terminating as soon as the periphery of 
the cell v(tab) = -R is reached. Following the trajectory 
of the walker, the number of collisions (N co \\) = N — 1 is 
monitored as well, so that eq. (QJ can be used to deter- 
mine the much longer reaction time tba- 



2.4. Analytical estimate for the collision time 

As has been discussed in detail elsewhere Q , it is possi- 
ble to estimate the reaction time for the protein using an 
analytical approach, once certain conditions are satisfied. 
The resulting expression is 



-(: 



V 



V8D 3d £ 4Z? ld 
with the 'sliding' variable 



2 (r a 

1 arctan — 

7r V i 



i = 



DwJ£ 
2irD 3d 



(15) 



(16) 



and Did and being the diffusion coefficients in 

sliding-mode and free diffusion, respectively. Generally, 
the equilibrium constant K has to be determined in sim- 
ulations of a (small) test system, containing a piece of 
chain without specific binding site @- In the present 
model, K is known analytically via eq. (|10[1 . If the step- 
size dr of the random walker is equal both inside and 
outside the chain (the direction of the step being arbi- 
trary), we further have -Did = D 3d = dr 2 /6, and hence 
obtain 



(17) 



This variable has got the dimension of length; as we have 
pointed out in [8J, it corresponds to the average sliding 
length of the protein along the DNA contour in Halford's 
model 7]. In this light, a (non rigorous) interpretation of 
eq. (|15l) is as follows: The first term in the round brackets 
represents the time of free diffusion of the walker, whereas 
the second term stands for the time of one-dimensional 
sliding. With increasing affinity of the walker to the chain 
(expressed as a reduced value for the exit probability p) , 



the sliding variable £ increases and the contribution of 
free diffusion to the reaction time (first term in I15fl be- 
comes less significant. At the same time, the second term 
of eq. H15|) is growing. Depending on the choice of system 
parameters, there may be a turning point where the lat- 
ter contribution over-compensates the former, so that the 
total reaction time increases once £ is growing further. 

For a random walk model as simple as used here, 
this analytical formula describes the reaction times well 
within 10% tolerance, as long as the following conditions 
are satisfied: (1) £ < J?, i.e. the sliding parameter should 
be small compared to the system size. This restriction 
assures the correct normalization of the protein's prob- 
ability distributions and the diffusion efficiencies as dis- 
cussed in (2) During the diffusion process, the system 
reaches its equilibrium, so that the constant K repre- 
sents the average times the protein spends in free and 
in non-spccifically bound mode. This requires either a 
crowded environment (the chain-density inside the cell is 
high enough) or a reasonably small value for £, since the 
initial position of the walker is always at the periphery 
and outside the chain, i.e. not in equilibrium. (3) £ < l v , 
where l p is the persistence length of the chain. This 
restriction accounts for the assumption that the walker 
moves along an approximately straight line during one 
sliding period. However, numerical tests have shown that 
deviations from a straight geometry actually have little 
impact to the accuracy of the model. (4) The step-size 
of the random walk has to be small compared to the size 
of the binding site. 

It should be pointed out that an analytical approach 
as simple as that is by no means supposed to simulate 
the actual situation in a living cell. Instead, it serves as 
a platform for a much wider class of semi-empirical mod- 
els. The sliding-parameter £ contains the affinity of non- 
specific protein-DNA binding and is flexible to vary with 
the potential chosen for the simulation. The diffusion co- 
efficients -Did and D 3d can be adapted to experimental 
measurements, and the target size r a contains protein- 
specific reaction probabilities. These parameters can be 
fitted to either describe system-specific experimental re- 
sults or the output of more sophisticated numerical codes 
which would otherwise not permit any analytical treat- 
ment. 



3. NUMERICAL MODEL 

In order to approximate the real biological situation, 
the DNA was modeled by a chain of straight segments of 
equal length ^o- Its mechanical stiffness was defined by 
the bending energy associated with each chain joint: 



(18) 



where a represents the dimensionless stiffness parame- 
ter, and 9 the bending angle. The numerical value of a 
defines the persistence length (l p ), i.e. the "stiffness" of 
the chain. The excluded volume effect was taken into 
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account by introducing the effective chain radius r c . The 
conformations of the chain, with distances between non- 
adjacent segments smaller than r c , were forbidden. The 
target of specific binding was assumed to lie exactly in 
the middle of the DNA. The whole chain was packed in 
a spherical volume (cell) of radius R in such a way that 
the target occupied the central position. 

To achieve a close packing of the chain inside the cell, 
we used the following algorithm. First, a relaxed confor- 
mation of the free chain was produced by the standard 
Metropolis Monte-Carlo (MC) method. For the further 
compression, we defined the center-norm (c-norm) as the 
maximum distance from the target (the middle point) to 
the other parts of the chain. Then, the MC procedure 
was continued with one modification. Namely, a MC step 
was rejected if the c-norm was exceeding 105% of the low- 
est value registered so far. The procedure was stopped 
when the desired degree of compaction was obtained. 

The protein was modeled as a random walker within 
the cell with reflecting boundaries. During one time-step 
it was displaced by the distance dr in a random direc- 
tion. Once approaching the chain closer than its radius 
r c defining the "non-specific binding pipe" , it was al- 
lowed to enter it freely and continue its random walk 
inside. Upon crossing the pipe boundary from inside, it 
was either allowed to pass with the exit probability p or 
otherwise reflected back inside, as described in Section 

Below in this paper, one step dt was chosen as the unit 
of time and one persistence length l p — 50 nm of the 
DNA chain as the unit of distance. The following values 
of parameters were used. The length of one segment was 
chosen as Iq = 0.2, so that one persistence length was 
partitioned into 5 segments. The corresponding value of 
the stiffness parameter was a = 2.403 ^Jj. The chain 
radius was r c = 0.06, and the active site was modeled as 
a sphere of identical radius r a = 0.06 embedded into the 
chain. The step-size of the random walker both inside 
and outside the chain was dr = 0.02, corresponding to 
a diffusion coefficient — -Did — dr 2 /Q = 2 • 10~ 4 /3. 
This choice was a compromise between accuracy and sim- 
ulation time. Tests have confirmed that a smaller step- 
size could somewhat reduce the gap between theoretical 
fea. H5fl and simulated reaction time at small values of £. 



4. COMPUTATION OF THE SPECIFIC 
RECURRENCE TIME 

To compute the specific recurrence time tr of eq. Q14JI. 
a very small test system is sufficient. Moreover, the com- 
putations can be carried out for the collisions from within 
the specific binding site of radius r a 9J. The entire sys- 
tem, i.e. the sphere and a short piece of chain, was embed- 
ded into a cube of 4r a side- length with reflective walls. 
In principle, the size of the cube should be of no rele- 
vance, but it was found that, if chosen too small, effects 
of the finite step-size were emerging. The walker started 



inside the sphere. Each time upon leaving the spherical 
volume a collision was noted. If the walker was about to 
exit the cylindrical volume of the chain, it was reflected 
back inside with the probability 1 — p. The clock was 
halted as long as the walker moved outside the sphere 
and only counted time-steps inside the sphere. Since the 
binding site was embedded into the chain, its effective 
volume (ea. I12fl was simply V e g = V a /p, with V a being 
the volume of the specific binding site. 

Table[I]contains the results for 12 different values of the 
exit probability p. The recurrence time t r does in fact 
depend on p, although the spherical volume V a is fully 
embedded into the chain. The reason is that within one 
time-step, the walker may leave the sphere, but, depend- 
ing on p, subsequently reflected back from the chain's 
periphery into the binding site. Such a move is not ac- 
counted as a collision (there are no fractional time-steps). 
The computational cost of these simulations is negligible 
— Millions of cycles are carried out within minutes on a 
PC, and the statistical error of tr can be made negligibly 
small. 



5. MODEL SYSTEMS OF VARIOUS SIZES 

Next, simulations were carried out for cells of different 
volumes Vi = 4ir i?f/3 (see table^for a summary of the 
system parameters) . The chain lengths Li were chosen so 
that the density Li/Vi remained of the same order around 
3/4. First, the chain conformation was generated using 
the procedure of Section |3J Then, each simulation cycle 
started at the periphery of the active binding site (state 
A) and ended as soon as the periphery of the cell (state 



TABLE I: Recurrence time (3rd column) inside the spherical 
binding site (R = r a ), specific recurrence time eq. 11411 (4th 
column), and simulation results for the large system (R — 4.8, 
column 5-9). The first column is the exponent of the exit 
probability p = 2~ l , the second column the corresponding 
sliding parameter eq. 1171 . The last column defines the speed- 
up achieved with the MEC approach. 
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FIG. 1: A 'cell' of radius R = 4.8 (persistence lengths) con- 
taining a chain of L = 345.8, corresponding to 240 nm and 
17.3 /im, respectively. The chain was made of 1729 segments. 
The protein's specific binding site is located at the center (dot, 
not to scale). 



B) was reached. Whenever the walker returned back to 
the binding site (r < r a ), one collision was noted. As 
long as the walker remained inside the binding site, the 
clock was halted. For each value of the exit parameter p, 
which is related to the walker-chain affinity via eq. 
2000 cycles were carried out and the measurements were 
averaged, so that statistical fluctuations were reduced to 
about 2%. The simulations provided measurements of 
tab, the average time to reach B when starting from 

A, and (iV co ii), the number of returns to A on the way 
towards B. Equations l|13(l and Q}, which form the core 
of the MEC approach, were then applied to evaluate tba- 
Additionally, tba was simulated explicitly, starting from 

B, as a verification of the speed-up and accuracy of the 
MEC approach. The results are summarized in table ITT1 
In order to clarify the procedure, we shall first discuss 
the simulation of the largest cell R — 4.8 in more detail. 

Figure ^ displays the chain conformation inside the 
spherical cell in a 2-dimensional projection. The specific 
binding site is located at the center of the cell. Note that, 
wherever possible, the chain contour, constructed of 1729 
cylindrical segments, tries to avoid large bond angles, a 
result of the bending potential as discussed in Section 

Table U contains details of the simulation results for 12 
different values of the exit parameter p, varied asp = 2~ l , 
I = 0, . . . , 11. The second column is the sliding param- 
eter eq. <|17fl . With increasing protein-chain affinity, the 
walker is spending more time inside the chain volume so 
that the sliding parameter is growing in size, reaching 



FIG. 2: The first reaction time tba for the cell of radius R = 
4.8 persistence lengths as a function of the sliding parameter 
£. Dots: explicit simulation. Squares: MEC approach, which 
is exact within statistical errors. Speed-up: Factor 33.4 of 
simulation steps after integration over all data points. The 
solid curve is the analytical estimate eq. 11511 . 



a value of almost two persistence lengths at p = 2 . 
The following two columns are the recurrence time t r 
and Tr as discussed in Sec. 0J The next column is the 
number of collisions N (eq. [SJ . The more time it spends 
inside the chain contour, i.e. with increasing influence of 
facilitated diffusion, the more often the walker returns 
back to state A to cause a collision, before being able to 
reach state B for the first time to finish the cycle. From 
p = 1 (free diffusion) to p = 2~ n , the value of N gains 
almost two orders of magnitude. The next column is the 
average reaction time tab of the direction A — > B. This 
quantity initially remains almost constant, but at higher 
values of protein-chain affinity it begins to grow rapidly. 
The reason is because the walker becomes more and more 
trapped inside the chain volume and is unable to access 
the cell periphery as effectively as it does during free 
diffusion. The next column is the reaction time tba of 
the reaction B — > A as delivered by the MEC approach 
using eq. Jlj. The recurrence time tr was determined 
using eq. I|13[) . with the effective volume of eq. i|12|) and 
the specific recurrence time (column 4). The next 
column contains tba as obtained by direct simulations. 
When averaged over all data points, both results for tba 
differed by 2.4%. As shown in the last column, the ratio 
tba/tab was of the order 10-100. This defines the speed- 
up of the MEC approach over the explicit simulation of 
tba- Integrated over all data points, the total speed-up 
was equal to 33.4. 
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Figure[3]displays the first reaction times tba as a func- 
tion of the sliding parameter £. Both methods (explicit 
simulation and MEC approach) deliver identical results 
within the statistical errors. The solid curve is a plot 
of the analytical estimate eq. (|15() . which consistently 
under-estimates the first reaction time by 5-10% but oth- 
erwise describes the trends accurately, including the lo- 
cation of the minimum. The results prove that facili- 
tated diffusion is able to accelerate the reaction consid- 
erably. It is also obvious that a very high affinity of the 
protein to the chain becomes counter-productive: The 
walker spends long periods of time trapped within a par- 
ticular loop of the chain without being able to explore 
the remaining parts of the cell exhaustively. Ideally, the 
affinity has to be chosen so that the walker is occasionally 
able to dissociate from the chain and associate again af- 
ter having passed some time in free diffusion. The actual 
value of the ideal affinity depends on the system param- 
eters and is easily estimated using eq. (|15f) prior to any 
simulations. 

Table [H] contains a summary of the simulation results 
for various system sizes. It appears that the speed-up 
delivered by the MEC approach increased proportional 
to the square of the cell radius, and gained a significant 
dimension in the largest of our test systems. Whereas a 
cell as small as R = 1.2 was treated within 30 minutes on 
a PC, including 2000 runs of explicit simulation B — > A 
for 12 different values of the exit probability p, the large 
cell of R — 4.8 required more than 5 days for the same set 
of computations. The MEC method reduced that time 
to less than four hours. 



6. SUMMARY 

In this work, the method of excess-collisions (MEC), 
recently introduced as a technique to speed up the sim- 
ulation of intramolecular reactions in polymers, is gen- 
eralized to second order diffusion controlled reactions, 
and applied to the problem of facilitated diffusion of site- 
specific DNA-binding proteins. This method is based on 
eq. Q and (|13f) to simulate the much faster back-reaction 
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TABLE II: Simulation parameters (cell radius R, chain length 
L) and total speed-up. Column 3 contains the total number 
of time-steps n(BA) (integrated over all data points) for the 
explicit simulation of tba, column 4 is the integrated speed- 
up of MEC (the ratio n(BA)/n(AB)). The last column con- 
tains the deviation (averaged over all data points) between 
tba (explicit) and tba (MEC). 

Cell R Chain L Time-steps Speed-up Error (%) 
1.2 5A0 1.68 • 10 9 22 3~I) 
2.0 25.0 9.31 • 10 9 6.9 3.9 
3.2 102.6 4.17 ■ 10 10 16.6 2.3 
4.8 345.8 1.44 ■ 10 11 33.4 2.4 

A — » B (protein starts at the binding site and propagates 
to the cell-periphery) instead of B — > A. We have demon- 
strated how MEC led to a speed-up of up to two orders of 
magnitude, depending on protein-DNA affinity (Tabled, 
and gaining significance with increasing cell size (Table 

The cell model employed in this work was perhaps the 
most simple ansatz that was possible without being triv- 
ial, and intentionally so. The simulations had to cover 
a large range of system sizes in order to verify the ef- 
ficiency of the MEC approach. The chain-lengths span 
a factor of 64 from the smallest to the largest system. 
Nevertheless, the validity of our results does not depend 
on the complexity of the model, such as protein-DNA 
potential, which modifies the equilibrium constant K in 
eq. (110(1 and thereby the sliding parameter £ (eq. Ilfijl . 
hydrodynamic interactions, which would lead to effective 
diffusion coefficients, also modifying £, or the introduc- 
tion of protein orientation and conformation, acting on 
the effective target size r a . The speed-up is consistently 
evaluated in terms of simulation steps, not CPU-time, 
to ensure invariance on the complexity of the underly- 
ing protein/DNA model. Based on the results presented 
here, the MEC approach can be expected to reduce the 
numerical effort by orders of magnitude, once more so- 
phisticated (and time consuming) simulation techniques 
are employed to study biochemical reaction times in sys- 
tems of realistic dimensions. 
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